Abstract

The spatial preferential attachment (SPA) model is a model for complex networks.

In the SPA model, nodes are embedded in a metric space, and each node has a 
sphere of influence whose size increases if the node gains an in-link, and otherwise 
decreases with time. In this paper, we study the behaviour of the SPA model when 
the distribution of the nodes is non-uniform. Specifically, the space is divided into 
dense and sparse regions, where it is assumed that the dense regions correspond to 
coherent communities. We prove precise theoretical results regarding the degree 
of a node, the number of common neighbours, and the average out-degree in 
a region. Moreover, we show how these theoretically derived results about the 
graph properties of the model can be used to formulate a reliable estimator for 
the distance between certain pairs of nodes, and to estimate the density of the 
region containing a given node. 

Keywords: Spatial Random Graphs, Spatial Preferential Attachment Model, Preferential Attachment, Complex Networks, Web Graph, Co-citation, Common Neighbours


1 Introduction 

There has been a great deal of recent interest in modelling complex networks, driven by the increasing connectedness of our world. The hyperlinked structure of the Web, citation patterns, friendship relationships, and the spread of infectious disease are seemingly disparate linked data sets that are fundamentally very similar in nature.

Many models of complex networks share a common weakness: the ‘uniformity’ of the nodes; other than the link structure, there is no way to distinguish the nodes. One family of models that overcomes this deficiency is the family of spatial (or geometric) models, in which the nodes are embedded in a metric space. A node’s position, especially in relation to the others, has real-world meaning: the character of the node is encoded in its location. Similar nodes are closer in the space than dissimilar nodes. This metric space has many potential meanings: in communication networks, perhaps physical distance; in a friendship graph, an interest space; in the World Wide Web, a topic space. As an illustration, a node representing a webpage on pet food would be closer in the metric space to one on general pet care than to one on travel.

The Spatial Preferential Attachment (SPA) model [?], designed as a model for the World Wide Web, is one such spatial model. As its name suggests, the SPA model combines geometry and preferential attachment. What sets the SPA model apart is its use of ‘spheres of influence’ to accomplish preferential attachment: the greater the degree of a node, the larger its sphere of influence, and hence the higher the likelihood of the node gaining more neighbours. The SPA model produces scale-free networks, which exhibit many of the characteristics of real-life networks (see [?]). In [11], it was shown that the SPA model gave the best fit, in terms of graph structure, for a series of social networks derived from Facebook.

As the motivation behind spatial models is the ‘second layer of meaning’ (the character of the nodes as represented by their positions in the metric space), we hope to uncover this layer through examination of the link structure. In particular, estimating the distance between nodes in the metric space forms the basis for two important link mining tasks: finding entities that are similar (represented by nodes that are close together in the metric space) and finding communities (represented by spatial clusters of nodes in the metric space). We show how a theoretical analysis of a spatial model can lead to reliable tools to extract the ‘second layer of meaning’.

Most spatial models to date have used a uniform random distribution of nodes in the space. However, considering the real-world networks these models represent, this assumption fails to capture an essential aspect of real-life data. On a basic level, if the metric space represents actual physical space and the nodes are people, then we note that people cluster in cities and towns rather than being uniformly spread across the land. More abstractly, there are more webpages on a popular topic, corresponding to a small area of our metric space, than on a more obscure topic. The development of spatial network models therefore naturally turns to varying densities of node distribution: both ‘clumps’ of higher or lower density and gradually changing densities are possible. One of the more important goals is community recognition: the discovery and quantification of characteristically (semantically) similar nodes.

In this work we generalize the SPA model to an inhomogeneous distribution of nodes within the space. We assume distinct regions of different densities, where the dense regions are the ‘clusters’. We find that the local regions behave almost as if generated by independent SPA models with parameters derived from the densities. Many earlier results on the SPA model then translate easily to this inhomogeneous version, and we begin the process of uncovering the geometry using link analysis.
In the remainder of this section, we first review related work, and then we give a formal definition of the SPA model. In Section 2, we state our main theoretical results. In particular, we give the typical behaviour of the in-degree of a node, and use this to derive a relationship between the spatial distance and the number of common neighbours of a pair of nodes. The proofs of the theorems are given in Section 4.

In Section 3, we verify the asymptotic results from Section 2 through simulations of the SPA model that generate large graphs. Specifically, we show how the relationship between spatial distance and common neighbours can be used to devise a distance estimator which gives precise results. We also use the theoretical results to estimate the local density around a node. Our simulations show that these estimators give reliable results on the simulated data.


1.1 Background and Related Work 


Efforts to extract node information through link analysis began with a heuristic quantification of entity similarity: numerical values, obtained from the graph structure, indicating the relatedness of two nodes. Early simple measures of entity similarity, such as the Jaccard coefficient [?], gave way to iterative graph-theoretic measures in which two objects are similar if they are related to similar objects, such as SimRank [14]. Many such measures also incorporate co-citation, the number of common neighbours of two nodes, as proposed in the context of bibliographic research in an early paper by Small [?]. In [?], the authors make inferences on the social space for nodes in a social network, using Bayesian methods and maximum likelihood.

Generative spatial models were proposed in a more general setting, where the main objective was to generate graphs with properties that correspond to those observed in real-life networks. Different approaches were explored, for example using thresholds, or using a geometric variant of preferential attachment. Graph properties of this model were analyzed by Jordan in [15]; follow-up work on this model can be found in [?]. In [?], a non-uniform distribution of the points in space is considered. In [?], Jacob and Mörters propose a probabilistic spatial model where the link probability is a function decreasing with distance. The setting is general, and includes the SPA model as a special case. Follow-up work on this model can be found in [?].

The SPA model was first proposed in [?] as a model for the World Wide Web. In [?] and [?], it was proved that the SPA model produces graphs with certain properties that correspond to those observed in real-life networks. The authors’ previous paper [?] used common neighbours to explore the underlying geometry of the SPA model and to quantify node similarity based on distance in the space. However, the distribution of nodes in space was assumed to be uniform. The approach used in this paper is similar to that in [13], but we investigate the complications that arise when the distribution is non-uniform, which is clearly a more realistic setting.

An earlier version of this work, containing no proofs, was presented at the workshop WAW 2013. An extended abstract can be found in [?].
1.2 The Inhomogeneous SPA Model

We begin with a brief description of our inhomogeneous SPA model. The model presented here is a generalization of the SPA model introduced in [?], the main difference being that we allow for an inhomogeneous distribution of nodes in the space.

Let $S$ be the unit hypercube in $\mathbb{R}^m$, equipped with the torus metric derived from the Euclidean norm, or any equivalent metric. The nodes of the graphs produced by the SPA model are points in $S$ chosen via an $m$-dimensional point process. Most generally, the process is given by a probability density function $\rho$; $\rho$ is a measurable function such that $\int_S \rho \, d\mu = 1$. Precisely, for any measurable set $A \subseteq S$ and any $t$ such that $1 \le t \le n$, $\mathbb{P}(v_t \in A) = \int_A \rho \, d\mu$.

In fact, we will restrict ourselves to probability density functions that are locally constant. Precisely, we assume that the space $S = [0,1)^m$ is divided into $k^m$ equal-sized hypercubes, where $k$ is a constant natural number. Each hypercube is of the form $I_{j_1} \times I_{j_2} \times \cdots \times I_{j_m}$ $(0 \le j_1, j_2, \ldots, j_m < k)$, where $I_j = [j/k, (j+1)/k)$. Note that any density function $\rho$ can be approximated by such a locally constant function, so this restriction is justified.

To keep notation as simple as possible, we assume that the hypercubes are labelled $\mathcal{R}_i$, $1 \le i \le k^m$. Let $\rho_i$ be the density of $\mathcal{R}_i$, so the density function has value $\rho_i$ on $\mathcal{R}_i$. For any node $v$, let $\mathcal{R}(v)$ be the hypercube containing $v$, and let $\rho(v)$ be the density of $\mathcal{R}(v)$. Clearly, every hypercube has volume $k^{-m}$. Then the probability that a node $v_t$, introduced at time $t$, falls in $\mathcal{R}_i$ equals $q_i = \rho_i k^{-m}$, and the expected number of points in $\mathcal{R}_i$ equals $q_i n = \rho_i k^{-m} n$. It is easy to see that $\sum_i q_i = 1$. Thus we may model the point process as follows: at each time step $t$, one of the regions is chosen as the destination of $v_t$, where region $\mathcal{R}_i$ is chosen with probability $q_i$. Then, a location for $v_t$ is chosen uniformly at random from the chosen region $\mathcal{R}_i$.
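The two-stage sampling just described (first a region with probability $q_i$, then a uniform point inside it) can be sketched in a few lines of Python. This is an illustrative sketch only, not the simulation code used in this paper; representing the densities as a dictionary keyed by cell index, and the function names `region_probabilities` and `sample_point`, are our own hypothetical choices.

```python
import random


def region_probabilities(densities, k, m):
    """Return {cell: q_i} with q_i = rho_i * k**(-m).

    `densities` maps a cell index (j_1, ..., j_m), 0 <= j_i < k, to rho_i.
    The q_i must sum to 1, i.e. the density integrates to 1 over S.
    """
    q = {cell: rho * k ** (-m) for cell, rho in densities.items()}
    if abs(sum(q.values()) - 1.0) > 1e-9:
        raise ValueError("densities must integrate to 1 over S")
    return q


def sample_point(densities, k, m, rng=random):
    """Choose a cell with probability q_i, then a uniform point in that cell."""
    q = region_probabilities(densities, k, m)
    cells = list(q)
    cell = rng.choices(cells, weights=[q[c] for c in cells])[0]
    # Cell (j_1, ..., j_m) is I_{j_1} x ... x I_{j_m}, with I_j = [j/k, (j+1)/k).
    return tuple((j + rng.random()) / k for j in cell)


# Usage: k = 2, m = 2; cell (0, 0) is three times as dense as each other cell.
dens = {(0, 0): 2.0, (0, 1): 2 / 3, (1, 0): 2 / 3, (1, 1): 2 / 3}
print(sample_point(dens, k=2, m=2, rng=random.Random(1)))
```

Passing an explicit `random.Random` instance keeps runs reproducible; the check on $\sum_i q_i$ catches density tables that do not integrate to 1.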

The SPA model generates a stochastic sequence of graphs $(G_t)_{t \ge 0}$; for each $t \ge 0$, $G_t = (V_t, E_t)$, where $E_t$ is an edge set and $V_t \subseteq S$ is a node set. The in-degree of a node $v$ at time $t$ is denoted by $\deg^-(v,t)$; likewise, the out-degree is denoted by $\deg^+(v,t)$. The sphere of influence $S(v,t)$ of a node $v$ at time $t$ is defined as the ball, centred at $v$, with total volume

$$|S(v,t)| = \frac{A_1 \deg^-(v,t) + A_2}{t},$$

where $A_1, A_2 > 0$ are given parameters. If $(A_1 \deg^-(v,t) + A_2)/t > 1$, then $S(v,t) = S$, and so $|S(v,t)| = 1$. We impose the additional restriction that $p A_1 \max_i \rho_i < 1$; this prevents regions from becoming too dense. This property will always be assumed. The generation of an SPA model graph begins at time $t = 0$ with $G_0$ being the null graph. At each time step $t \ge 1$ (defined to be the transition from $G_{t-1}$ to $G_t$), a node $v_t$ is chosen from $S$ according to the given spatial distribution and added to $V_{t-1}$ to form $V_t$. Next, independently for each node $u \in V_{t-1}$ such that $v_t \in S(u, t-1)$, a directed link $(v_t, u)$ is created with probability $p$, $p \in (0,1)$ being another parameter of the model.
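To make the generation step concrete, the following Python sketch runs the process on a given sequence of node positions in the unit torus with the Euclidean torus metric. It is a hypothetical illustration rather than the implementation behind our experiments: the helper names are our own, and, as a simplifying assumption, the sphere of influence is tested as an ordinary ball of radius $(|S(u,t-1)|/c_m)^{1/m}$, where $c_m$ is the volume of the unit ball, ignoring the fact that such a ball wraps around the torus once its radius exceeds $1/2$. The naive loop over all existing nodes takes quadratic time, so it is only suitable for small $n$.

```python
import math
import random


def torus_dist(x, y):
    """Euclidean torus distance on [0, 1)^m."""
    return math.sqrt(sum(min(abs(a - b), 1 - abs(a - b)) ** 2 for a, b in zip(x, y)))


def unit_ball_volume(m):
    """c_m, the volume of the Euclidean unit ball in R^m (c_2 = pi)."""
    return math.pi ** (m / 2) / math.gamma(m / 2 + 1)


def spa_edges(points, A1, A2, p, rng=random):
    """Run the SPA process on the positions points[0], points[1], ...

    points[i] plays the role of v_{i+1}; it may link to each earlier node u
    whose sphere of influence S(u, i) contains it. Returns (edges, indeg),
    where edges are (new node, old node) index pairs.
    """
    m = len(points[0])
    cm = unit_ball_volume(m)
    indeg = [0] * len(points)
    edges = []
    for i in range(1, len(points)):
        for u in range(i):
            vol = (A1 * indeg[u] + A2) / i  # |S(u, i)| before capping at 1
            if vol >= 1:
                inside = True  # S(u, i) is the whole space
            else:
                # Simplification: plain ball of radius (vol / c_m)^(1/m).
                inside = torus_dist(points[i], points[u]) <= (vol / cm) ** (1 / m)
            if inside and rng.random() < p:
                edges.append((i, u))
                indeg[u] += 1
    return edges, indeg


# Usage (illustrative): 300 uniformly placed nodes, parameters as in Figure 3.
rng = random.Random(7)
pts = [(rng.random(), rng.random()) for _ in range(300)]
edges, indeg = spa_edges(pts, A1=0.7, A2=2.0, p=0.6, rng=rng)
print(len(edges), max(indeg))
```

Taking the positions as input keeps the link rule separate from the spatial distribution, so any sampler, uniform or locally constant, can feed it.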

Let $\delta(v)$ be the distance from $v$ to the boundary of $\mathcal{R}(v)$, and let $r(v,t)$ be the radius of the sphere of influence of node $v$ at time $t$. If $r(v,t) < \delta(v)$, then $S(v,t)$ is completely contained in $\mathcal{R}(v)$ at time $t$. We see that

$$r(v,t) = \left(\frac{|S(v,t)|}{c_m}\right)^{1/m} = \left(\frac{A_1 \deg^-(v,t) + A_2}{c_m t}\right)^{1/m},$$

where $c_m$ is the volume of the unit ball; for example, in two dimensions with the Euclidean metric, $c_2 = \pi$.

As is typical in random graph theory, we shall consider only asymptotic properties of $G_n$ as $n \to \infty$. We say that an event in a probability space holds asymptotically almost surely (a.a.s.) if its probability tends to one as $n$ goes to infinity. We emphasize that the notations $o(\cdot)$ and $O(\cdot)$ refer to functions of $n$, not necessarily positive, whose growth is bounded. Since we aim for results that hold a.a.s., we will always assume that $n$ is large enough.


2 Graph properties of the SPA model 

In this section we investigate typical properties of graphs produced by the inhomogeneous SPA model, aiming to use the results to infer the spatial distances between the nodes. A central observation is that in the inhomogeneous SPA model with a locally constant density function, the probability of an edge forming from a new node $v_t$ to an existing node $v$ at time $t$ equals

$$\mathbb{P}\big((v_t, v) \in E(G_n)\big) = p \int_{S(v,t)} \rho \, d\mu = p \sum_i \rho_i \, |S(v,t) \cap \mathcal{R}_i|.$$


In the analysis of the original SPA model from [?], we find that spheres of influence of nodes that are born early typically shrink rapidly, while nodes born late start with small spheres of influence. A node would have to be quite close to the boundary between its region and another one for the effect of any other region to be felt. If we ignore such nodes, the expression for the link probability is very similar to that of the link probability of the original SPA model. Therefore, it seems reasonable to expect that the graph formed by nodes in a region $\mathcal{R}_i$ with local density $\rho_i$ behaves like an independent SPA model of density $\rho_i$. Our results will show that this expectation is justified and can be made rigorous.

To be specific, assume that nodes in the SPA model do not arrive at fixed, discrete time instances $t$, but instead arrive according to a homogeneous Poisson process with rate 1. (This will not significantly change the analysis but is a convenient assumption.) Then, the process inside a region $\mathcal{R}$ with density $\rho$ will behave like the SPA model with the same parameters $A_1$, $A_2$ and $p$, but with points arriving according to a Poisson process with rate $\rho$. This means that in each time interval of unit length we expect $\rho$ points to arrive, and the expected time interval between arrivals equals $1/\rho$. If we use $v_t$ to denote the $t$-th node arriving, then the arrival time $a(t)$ of $v_t$ is approximately $t/\rho$, and thus the volume of the sphere of influence of an existing node $v$ at the time that $v_t$ is born equals

$$|S(v, a(t))| = \frac{A_1 \deg^-(v, a(t)) + A_2}{a(t)} \approx \frac{\rho A_1 \deg^-(v, a(t)) + \rho A_2}{t}.$$

Thus, in the analysis of the degree of an individual node, we expect a node $v$ in the inhomogeneous SPA model to behave like a node in the original SPA model with parameters $\rho(v)A_1$, $\rho(v)A_2$ instead of $A_1$, $A_2$, where the degree of node $v$ at time $t$ in the inhomogeneous SPA model corresponds to the degree of a node at time $a(t)$ in the corresponding SPA model. The following theorems show that this is indeed the case.


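Theorem 2.1. Let $\omega = \omega(n)$ be any function tending to infinity together with $n$, and let $\varepsilon > 0$. The following holds with probability $1 - o(n^{-1})$. For every node $v$ for which

$$\deg^-(v,n) = k = k(n) \ge \omega^2 \log n$$

and for which

$$\delta(v) \ge (1+\varepsilon)\left(\frac{A_1 k + A_2}{c_m n}\right)^{1/m} = (1+\varepsilon)\, r(v,n), \qquad (1)$$

it holds that for all values of $t$ such that $\max\{t_v, T_v\} \le t \le n$,

$$\deg^-(v,t) = (1+o(1))\, k \left(\frac{t}{n}\right)^{p\rho(v)A_1}. \qquad (2)$$

Times $T_v$ and $t_v$ are defined as follows:

$$T_v = n\left(\frac{\omega \log n}{k}\right)^{1/(p\rho(v)A_1)}, \qquad t_v = (1+\varepsilon)\left(\frac{A_1 k}{c_m\, \delta(v)^m\, n^{p\rho(v)A_1}}\right)^{1/(1-p\rho(v)A_1)}. \qquad (3)$$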
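The quantities in Theorem 2.1 are easy to evaluate numerically. The Python sketch below computes the predicted in-degree $k(t/n)^{p\rho(v)A_1}$ (dropping the $(1+o(1))$ factor) and the times $T_v$ and $t_v$. The function names are hypothetical, and $\omega$ must be supplied as a number, whereas in the theorem it is any function tending to infinity; the value used below is purely illustrative.

```python
import math


def predicted_indegree(k, n, t, p, rho, A1):
    """k (t/n)^(p rho A1): predicted deg^-(v, t), without the (1 + o(1)) factor."""
    return k * (t / n) ** (p * rho * A1)


def T_v(k, n, omega, p, rho, A1):
    """n (omega log n / k)^(1/(p rho A1)): when deg^- reaches about omega log n."""
    return n * (omega * math.log(n) / k) ** (1 / (p * rho * A1))


def t_v(k, n, delta, m, p, rho, A1, eps):
    """(1 + eps) (A1 k / (c_m delta^m n^a))^(1/(1 - a)), with a = p rho A1."""
    cm = math.pi ** (m / 2) / math.gamma(m / 2 + 1)  # volume of the unit ball
    a = p * rho * A1
    return (1 + eps) * (A1 * k / (cm * delta ** m * n ** a)) ** (1 / (1 - a))


# Usage (illustrative values; omega = 2 is an arbitrary stand-in).
n, k, p, rho, A1 = 10**5, 200, 0.6, 1.6, 0.7
start = max(T_v(k, n, 2.0, p, rho, A1), t_v(k, n, 0.1, 2, p, rho, A1, eps=0.1))
print(start, predicted_indegree(k, n, start, p, rho, A1))
```

By construction, the predicted degree at $T_v$ equals $\omega \log n$, and at $t_v/(1+\varepsilon)$ the predicted sphere volume $A_1 \deg^-(v,t)/t$ (ignoring $A_2$) equals $c_m \delta(v)^m$.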


The condition on $\delta(v)$ ensures that at time $n$, $S(v,n)$ is completely contained in $\mathcal{R}(v)$ (deterministically). In fact, due to the additional multiplicative factor of $(1+\varepsilon)$, $S(v,n)$ is some distance removed from the boundary of $\mathcal{R}(v)$. The expression for $T_v$ is chosen so that at this time node $v$ has a.a.s. at least $\omega \log n$ neighbours. Likewise, $t_v$ is chosen such that at this time a.a.s. the sphere of influence has shrunk so that its radius is sufficiently smaller than $\delta(v)$, again with some extra room to spare. The implication of this theorem is that once a node accumulates at least $\omega \log n$ neighbours and its sphere of influence has shrunk so that it does not intersect neighbouring regions, its behaviour can be predicted with high probability until the end of the process, and is governed completely by its own region. In particular, it follows that from time $\max\{t_v, T_v\}$ onwards the sphere of influence is completely contained in $\mathcal{R}(v)$.

For most vertices, the moment when they first achieve $\omega \log n$ neighbours ($T_v$) will come before the moment that their sphere of influence has shrunk so that it is well contained in the region ($t_v$). Indeed, consider a vertex $v$ of degree at least $\omega \log n$ for which this is not the case. Let $T$ be the moment when the vertex reaches in-degree $\omega \log n$. By definition, the sphere of influence of $v$ at time $T$ has a radius of order $(\omega \log n / T)^{1/m}$. If $T \gg \omega \log n$, then the radius is $o(1)$, and the probability that $v$ is this close to the border is also $o(1)$. The only vertices for which the radius at time $T$ could potentially be fairly large are those for which $T = O(\omega \log n)$, that is, the oldest vertices. These vertices do have high degree, but their spheres of influence still tend to shrink over time, so most of their edges will be acquired after time $t_v$, that is, when their sphere of influence has shrunk to be contained in the region.

We can use the results on the degree to show that each graph induced by one of the regions $\mathcal{R}_i$ has a power law degree distribution. Let $N_i(j,n)$ denote the number of nodes of degree $j$ at time $n$ in the region $\mathcal{R}_i$. The proof of the following result is a straightforward adaptation of the differential equations method used to prove the counterpart result for the uniform model (see [?]). Since this theorem is not needed to prove the main result of this paper, the proof is omitted here.

Theorem 2.2. A.a.s. the graph induced by the nodes in region $\mathcal{R}_i$ has a power law degree distribution with coefficient $1 + 1/(p\rho_i A_1)$. Precisely, a.a.s. for any $1 \le \ell \le k^m$ there exists a constant $c_\ell$ such that for any $1 \ll j \le j_f = \left(n/\log^8 n\right)^{p\rho_{\max}A_1/(4p\rho_{\max}A_1 + 2)}$,

$$N_\ell(j,n) = (1+o(1))\, c_\ell\, j^{-(1 + 1/(p\rho_\ell A_1))}\, n.$$

Moreover, a.a.s. the entire graph generated by the inhomogeneous SPA model has a degree distribution whose tail follows a power law with coefficient $1 + 1/(p\rho_{\max}A_1)$.

The number of edges also supports our hypothesis that a region of a certain density behaves almost as a uniform SPA model with adjusted parameters. In the original SPA model with parameters $A_1$ and $A_2$ replaced by $\rho A_1$ and $\rho A_2$ (and link probability $p$), the average out-degree is approximately $p\rho A_2/(1 - p\rho A_1)$, as per [?, Theorem 1.3]. The following theorem shows that the subgraph induced by one of the regions has the equivalent expected number of edges. This theorem also shows that a.a.s. the number of edges that cross the boundary of a region is of smaller order than the number of edges completely contained in that region. Thus, almost all edges have both endpoints in the same region.

Theorem 2.3. A.a.s., for all regions $\mathcal{R}_i$ of density $\rho_i$, $|V(G_n) \cap \mathcal{R}_i| = (1+o(1))\, q_i n$. Moreover,

$$\mathbb{E}\big(|\{(u,v) \in E(G_n) : u, v \in \mathcal{R}_i\}|\big) = (1+o(1))\, \frac{p\rho_i A_2}{1 - p\rho_i A_1}\, q_i n.$$

Furthermore, a.a.s.

$$|\{(u,v) \in E(G_n) : \mathcal{R}(u) \ne \mathcal{R}(v)\}| = o(n).$$

Here we see that we need the condition $p\rho_{\max}A_1 < 1$: if $p\rho_{\max}A_1 > 1$, then the number of edges would grow superlinearly.
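As a quick check of Theorem 2.3, the leading term of the average out-degree within a region can be evaluated directly. The Python sketch below (function names are our own) reproduces the values 5.85 and 1.45 that Section 3 reports for the diagonal layout with $p = 0.6$, $A_1 = 0.7$, $A_2 = 2.0$, $\rho_d = 1.6$ and $\rho_s = 0.8$. It also rejects parameter values that violate $p\rho A_1 < 1$.

```python
def avg_out_degree(p, rho, A1, A2):
    """Leading term p rho A2 / (1 - p rho A1) of the average out-degree in a region."""
    a = p * rho * A1
    if a >= 1:
        raise ValueError("need p * rho * A1 < 1; otherwise edges grow superlinearly")
    return p * rho * A2 / (1 - a)


def expected_edges_in_region(n, q_i, p, rho, A1, A2):
    """Leading term of E|{(u, v) in E(G_n) : u, v in R_i}| from Theorem 2.3."""
    return avg_out_degree(p, rho, A1, A2) * q_i * n


# Diagonal layout with p = 0.6, A1 = 0.7, A2 = 2.0:
print(round(avg_out_degree(0.6, 1.6, 0.7, 2.0), 2))  # dense region: 5.85
print(round(avg_out_degree(0.6, 0.8, 0.7, 2.0), 2))  # sparse region: 1.45
```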
Our ultimate goal is to derive the pairwise distances between the nodes in the metric space through an analysis of the graph. The following theorem, obtained using the approach of [13], provides an important tool: it links the number of common in-neighbours of a pair of nodes to their (metric) distance. Using this theorem, we can then infer the distance from the number of common in-neighbours.

The theorem distinguishes three cases. If $u$ and $v$ are relatively far from each other, then they will have very few common neighbours. If the nodes are very close, then the number of common neighbours is approximately equal to a fraction $p$ of the degree of the node of smaller degree. The third case provides a ‘sweet spot’ where the number of common neighbours is a direct function of the metric distance and the degrees of the nodes. For any two nodes $u$ and $v$, let $cn(u,v,t)$ denote the number of common in-neighbours of $u$ and $v$ at time $t$.

Theorem 2.4. Let $\omega = \omega(n)$ be any function tending to infinity together with $n$, and let $\varepsilon > 0$. The following holds a.a.s. Let $u$ and $v$ be nodes of final degrees $\deg^-(u,n) = k$ and $\deg^-(v,n) = j$ such that $\mathcal{R} = \mathcal{R}(u) = \mathcal{R}(v)$ and $k \ge j \ge \omega^2 \log n$. Let $\rho = \rho(v) = \rho(u)$, let

$$T_v = n\left(\frac{\omega \log n}{j}\right)^{1/(p\rho A_1)},$$

and assume that

$$\delta(v)^m \ge c\, j \quad \text{and} \quad \delta(u)^m \ge c\, k, \qquad \text{where } c = (1+\varepsilon)\frac{A_1}{c_m n}.$$

Let $d(u,v)$ be the distance between $u$ and $v$ in the metric space. Then we have the following result about the number of common in-neighbours of $u$ and $v$:


Case 1. If

$$d(u,v) \ge 2\left(\frac{A_1 (\omega \log n)(k/j)}{c_m T_v}\right)^{1/m},$$

then $cn(u,v,n) = O(\omega \log n)$.

Case 2. If

$$d(u,v) \le \left(\frac{A_1 k + A_2}{c_m n}\right)^{1/m} - \left(\frac{A_1 j + A_2}{c_m n}\right)^{1/m},$$

then $cn(u,v,n) = (1+o(1))\, p\, j$.
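Case 3. If

$$\left(\frac{A_1 k + A_2}{c_m n}\right)^{1/m} + \left(\frac{A_1 j + A_2}{c_m n}\right)^{1/m} < d(u,v) < 2\left(\frac{A_1 (\omega \log n)(k/j)}{c_m T_v}\right)^{1/m},$$

then

$$cn(u,v,n) = C\, j\, n^{-\alpha} k^{\alpha} d(u,v)^{-\alpha m}\left(1 + o(1) + O\!\left(\left(\frac{j}{k}\right)^{1/m}\right)\right), \qquad (4)$$

where

$$\alpha = \frac{p\rho A_1}{1 - p\rho A_1} \quad \text{and} \quad C = p A_1^{\alpha} c_m^{-\alpha}.$$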

Note that if $j \ll k$, then we have a precise asymptotic formula for $cn(u,v,n)$. If $j$ and $k$ are approximately equal, then the formula only states that $cn(u,v,n) = \Theta(j n^{-\alpha} k^{\alpha} d(u,v)^{-\alpha m})$.


3 Reconstruction of Geometry 

We set out to discover the character of nodes in a network purely through link structure, and to quantify their similarities. Spatial models give us a convenient definition of similarity: distances between nodes. In the SPA model, the number of common neighbours allows us to uncover a good approximation of pairwise distances, a first step in the reconstruction of the geometry.


Description of Model Used: For simulations, we use an inhomogeneous SPA model that we call the diagonal layout, which has 4 ‘clusters’ of identical high density, with $m = 2$. In the diagonal layout, $k = 4$ and the 4 regions $(x,x)$, $1 \le x \le 4$, are dense, with the others sparse. We will use ‘dense region’ and ‘sparse region’ to denote the union of all regions with densities $\rho_d$ and $\rho_s$, respectively. For ease of notation, we note that $\frac{1}{16}(4\rho_d + 12\rho_s) = 1$, so $\rho_s = 4/3 - \rho_d/3$. Thus it is enough to provide the value of $\rho_d$ only. In Figure 1 we see an example of the diagonal layout with nodes and edges, and we also see evidence that the densest region dominates the power law degree distribution. The yellow line is the prediction for the degree distribution with the power law exponent based on the maximum density, as in Theorem 2.2.
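The diagonal layout and the exponent behind the predicted (yellow) line can be computed as follows. This Python sketch is illustrative only: it indexes cells from 0 rather than 1, and it generalizes $\rho_s = 4/3 - \rho_d/3$ to a $k \times k$ grid with $k$ dense diagonal cells, which is our own extrapolation beyond the $k = 4$ case used here.

```python
def diagonal_layout(rho_d, k=4):
    """Cell densities for the diagonal layout (m = 2), cells indexed from 0.

    The k diagonal cells (x, x) get rho_d and the other k*k - k cells get
    rho_s, chosen so the densities average to 1; for k = 4 this is
    rho_s = 4/3 - rho_d/3.
    """
    rho_s = (k * k - k * rho_d) / (k * k - k)
    return {(x, y): rho_d if x == y else rho_s for x in range(k) for y in range(k)}


def tail_exponent(p, rho_max, A1):
    """Theorem 2.2: power law coefficient 1 + 1/(p rho_max A1) of the tail."""
    return 1 + 1 / (p * rho_max * A1)


dens = diagonal_layout(1.6)
print(dens[(0, 1)])  # rho_s, approximately 0.8
print(tail_exponent(0.7, 1.2, 0.7))  # right panel of Figure 1, about 2.70
```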



Figure 1: Left: diagonal layout, $n = 1{,}000$, $p = 0.6$, $\rho_d = 1.6$, $A_1 = 0.7$, $A_2 = 2.0$; Right: degree distribution, $n = 1{,}000{,}000$, $p = 0.7$, $\rho_d = 1.2$, $A_1 = 0.7$, $A_2 = 1.0$.
Our estimator for the distance is derived from Case 3 of Theorem 2.4, and in particular from Equation (4), ignoring the error term. This leads to the following formula for the estimated distance $\hat{d}$. For a pair of nodes $u, v$ with $\deg^-(u,n) = k$ and $\deg^-(v,n) = j$, $k \ge j$, whose distance is such that Case 3 applies, the estimate is given by:


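$$\hat{d}(u,v) = \left(\frac{p^{\gamma} j^{\gamma} A_1 k}{c_m n \, cn(u,v,n)^{\gamma}}\right)^{1/m}, \qquad (5)$$

where $\gamma = \frac{1 - p\rho(u)A_1}{p\rho(u)A_1}$ (note that $\gamma = 1/\alpha$ with $\alpha$ as in Equation (4)). If the density is uniform, that is, if $\rho(u) = 1$ for all $u$, then the above estimator is the same as our original estimator: Equation (7) from [13].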
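The following Python sketch implements Equation (4) without its error term, together with the estimator (5), and checks that the estimator inverts the formula exactly. The function names and the parameter values in the usage lines are hypothetical. The sketch ignores the $o(1)$ and $O((j/k)^{1/m})$ corrections, so it is only meaningful for pairs in Case 3; setting `rho=1` gives the original uniform-density estimator.

```python
import math


def unit_ball_volume(m):
    """c_m, the volume of the Euclidean unit ball in R^m."""
    return math.pi ** (m / 2) / math.gamma(m / 2 + 1)


def predicted_cn(d, j, k, n, p, rho, A1, m):
    """Equation (4) without its error term: C j n^-a k^a d^(-a m)."""
    a = p * rho * A1 / (1 - p * rho * A1)  # alpha
    C = p * A1 ** a * unit_ball_volume(m) ** (-a)  # C = p A1^alpha c_m^-alpha
    return C * j * n ** (-a) * k ** a * d ** (-a * m)


def estimate_distance(cn, j, k, n, p, rho, A1, m):
    """Equation (5), with gamma = (1 - p rho A1) / (p rho A1) = 1/alpha."""
    g = (1 - p * rho * A1) / (p * rho * A1)
    return ((p * j) ** g * A1 * k / (unit_ball_volume(m) * n * cn ** g)) ** (1 / m)


# Round trip with illustrative values: the estimator recovers d = 0.03.
cn = predicted_cn(0.03, j=50, k=200, n=10**5, p=0.7, rho=1.6, A1=0.7, m=2)
print(estimate_distance(cn, j=50, k=200, n=10**5, p=0.7, rho=1.6, A1=0.7, m=2))
```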

Since a direct relationship between spatial distance and the number of common neighbours exists only in Case 3 of Theorem 2.4, we try to eliminate pairs to which one of the other cases applies. Pairs in Case 2 are very close, and for such pairs the expected number of common neighbours is $p\deg^-(v,n) = pj$. In an attempt to avoid this case, we filter out all pairs where the number of common neighbours is greater than $pj/2$. Pairs in Case 1 are so far apart that their spheres of influence overlap for a very short time, if at all. We try to avoid this case by eliminating pairs with 10 or fewer common neighbours.
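The filtering heuristic can be sketched as below. This hypothetical Python helper computes common in-neighbours from a list of directed edges `(source, target)` and keeps a pair only if $10 < cn(u,v,n) \le pj/2$, where $j$ is the smaller in-degree. It compares all pairs of nodes that have at least one in-neighbour, which is quadratic and intended only for illustration, not for graphs of the size used in our experiments; node identifiers are assumed to be mutually comparable so they can be sorted.

```python
from collections import defaultdict
from itertools import combinations


def in_neighbours(edges):
    """Map each node to the set of its in-neighbours; edges are (source, target)."""
    nbrs = defaultdict(set)
    for src, dst in edges:
        nbrs[dst].add(src)
    return nbrs


def candidate_pairs(edges, p, min_cn=10):
    """Yield (u, v, k, j, cn) with k = deg^-(u) >= j = deg^-(v) and
    min_cn < cn <= p*j/2, i.e. pairs that plausibly fall in Case 3."""
    nbrs = in_neighbours(edges)
    for a, b in combinations(sorted(nbrs), 2):
        u, v = (a, b) if len(nbrs[a]) >= len(nbrs[b]) else (b, a)
        k, j = len(nbrs[u]), len(nbrs[v])
        cn = len(nbrs[u] & nbrs[v])
        if min_cn < cn <= p * j / 2:
            yield u, v, k, j, cn
```

Each surviving tuple supplies exactly the inputs $k$, $j$ and $cn(u,v,n)$ that the estimator (5) needs.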

To see the effect of the non-uniform density, we first apply the original estimator to our diagonal layout. In other words, we define the estimated distance as in Equation (5), but taking $\rho(u) = 1$ for all $u$, and we apply this estimator to points obtained from a non-uniform distribution. The motivation for this experiment is that, when applying our techniques to real-life data, we are not likely to know the local density of a node. Figure 2 (left side) gives the estimated versus real distance for a graph with $n = 100{,}000$ nodes, generated via the SPA model from the diagonal layout with parameters $p = 0.7$, $\rho_d = 1.6$, $A_1 = 0.7$, $A_2 = 2.0$. After filtering as described above, 2,270 pairs are left.

The figure shows that assuming uniform density leads to a consistent overestimate of the distance between the nodes. This may seem counterintuitive. The trouble lies with the estimator’s assumption about a node’s age, which is based on its final in-degree. A node in the dense region has more neighbours than expected under the assumption of uniform density, and thus the node is thought to be much older than it actually is. This confounds the distance estimator.

Using the same simulation results, we now apply the estimator from Equation (5), using our knowledge of $\rho(u)$. The right panel of Figure 2 shows the estimated distance using Equation (5) vs. the actual node distance. The results indicate that our new estimator is significantly more accurate in predicting distances for pairs of nodes in the dense region.

Let us mention that the estimation for pairs in the sparse region is still not accurate, while the estimation for cross-border pairs appears to be even worse. This is likely caused by the fact that nodes involved in cross-border pairs and in sparse-region pairs that have enough common neighbours to be included are likely to be older nodes, i.e. nodes born near the beginning of the process. For such nodes, in the early stages there is likely some overlap between their sphere of influence and the bordering dense regions. Thus, the degree likely does not follow the prediction from Theorem 2.1, which, in turn, affects the performance of the distance estimator for those pairs.

Figure 2: SPA model, $n = 100{,}000$, diagonal layout, $p = 0.7$, $\rho_d = 1.6$, $A_1 = 0.7$, $A_2 = 2.0$; actual vs. estimated distances for pairs of nodes. Left: using the original estimator; Right: using the new estimator, density known.

Better performance for cross-border pairs could possibly be obtained by using a linear combination of the densities in Equation (5). However, we will see in what follows that better performance for all pairs occurs when we use the data itself to estimate the density. Also, we point out that the pairs in the dense region constitute the large majority of all pairs. Moreover, the dense regions are those most likely to correspond to communities of interest. Therefore, accurate prediction for pairs in these regions is most important.


Estimating the density: In real-world situations, we cannot assume that we know the density of the region containing a given node. In fact, the density of the region containing a node is an important part of the ‘second layer of meaning’ which we aim to extract from the graph. Here we show that our theoretical results give us a tool for estimating the local density around a node, using only its neighbourhood. We also apply our distance estimator once more, this time using the estimated density in the formula.

Using the theoretical results obtained in the previous section, we can estimate the density of the region $\mathcal{R}(v)$ containing a given node $v$ from the average out-degree of the in-neighbours of $v$. As per Theorem 2.3, the average out-degree in $\mathcal{R}_i$ is approximately

$$\frac{p\rho_i A_2}{1 - p\rho_i A_1}. \qquad (6)$$

If we have a large enough set of nodes from the same region, then we can use the formula above to estimate the density of the region. Consider a node $v$, and make two assumptions: (i) almost all neighbours of $v$ are contained in $\mathcal{R}(v)$, and (ii) the neighbours of $v$ form a representative sample of all nodes of $\mathcal{R}(v)$. Simulations show that these assumptions are justified and allow us to make an estimate for $\rho(v)$. Assumption (i) is additionally justified by the last part of Theorem 2.3, which states that the number of edges crossing the border is negligible compared to the total number of edges. Let $\overline{\deg}^+(v)$ denote the average out-degree of the in-neighbours of $v$. Specifically,

$$\overline{\deg}^+(v) = \frac{1}{\deg^-(v)} \sum_{u \in N^-(v)} \deg^+(u),$$

where $N^-(v)$ is the set of in-neighbours of $v$. Given our assumptions, an estimator for the density in $\mathcal{R}(v)$, denoted by $\hat{\rho}(v)$, can be derived from this average out-degree using Equation (6):

$$\hat{\rho}(v) = \frac{\overline{\deg}^+(v)}{pA_2 + pA_1 \overline{\deg}^+(v)}.$$
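A minimal Python sketch of the density estimator follows, again operating on a list of directed edges `(source, target)`; the function names are our own. It computes $\overline{\deg}^+(v)$ using out-degrees taken over the whole graph and then inverts Equation (6).

```python
from collections import defaultdict


def density_from_avg_out_degree(avg, p, A1, A2):
    """Invert Equation (6): rho = avg / (p A2 + p A1 avg)."""
    return avg / (p * A2 + p * A1 * avg)


def estimate_density(edges, v, p, A1, A2):
    """rho_hat(v) from the average out-degree of v's in-neighbours.

    `edges` is a list of directed (source, target) pairs for the whole graph.
    """
    out_deg = defaultdict(int)
    in_nbrs = defaultdict(set)
    for src, dst in edges:
        out_deg[src] += 1
        in_nbrs[dst].add(src)
    if not in_nbrs[v]:
        raise ValueError("v has no in-neighbours")
    avg = sum(out_deg[u] for u in in_nbrs[v]) / len(in_nbrs[v])
    return density_from_avg_out_degree(avg, p, A1, A2)
```

As a sanity check, feeding the dense-region value 5.85 of $\overline{\deg}^+(v)$ into the inversion with $p = 0.6$, $A_1 = 0.7$, $A_2 = 2.0$ returns a density close to 1.6.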

The left side of Figure 3 shows a histogram of the values of $\overline{\deg}^+(v)$ for our simulated graph. Displayed are the results for nodes with $\deg^-(v) > 10$. The graph is obtained from the SPA model where points have the previously described diagonal layout, with density $\rho_d = 1.6$ in the dense region, and consequently density $\rho_s = 0.8$ in the sparse region, and with $p = 0.6$, $A_1 = 0.7$, $A_2 = 2.0$. For these parameters, Equation (6) gives a theoretical value of 5.85 for $\overline{\deg}^+(v)$ if node $v$ lies in the dense region, and a value of 1.45 if $v$ lies in the sparse region.

We see in Figure 3 (left side) that the values of $\overline{\deg}^+(v)$ in the dense region are quite accurate, with peaks occurring around the calculated value of 5.85. For the sparse region, the peaks occur around 2.5, giving an estimate for the average out-degree which is higher than expected. This is likely caused by nodes in the sparse region that are located close to the border, and thus are likely to have neighbours in the dense region. Such nodes also tend to have high degree, and our condition on the minimum degree favours the ‘rich’ sparse-region nodes.

Figure 3 (right side) gives a histogram of the estimated densities of the nodes. For nodes in the dense region, the true value is 1.6, and we see a good estimation of this value for these nodes. For nodes in the sparse region, the true value is 0.8, while the peak of the estimated densities occurs around 1.15, and almost all values are greater than 0.8. Again, this is likely caused by nodes whose sphere of influence overlapped with the dense region.

To obtain better performance for nodes in the sparse region, we propose to base the estimated density for a node $v$ in the sparse region only on the out-degrees of in-neighbours of $v$ of low in-degree; such neighbours are young, so when they were born the sphere of influence of $v$ had already shrunk and was thus more likely to be fully inside the sparse region. To obtain density estimates for nodes with small in-degree, we can use the second neighbourhood to compute the average out-degree. Nodes with small in-degree are young, so even their second neighbours are likely to be close. We plan to explore these possibilities in future work.
Figure 3: Diagonal layout, $n = 100{,}000$, $\rho_d = 1.6$, $A_1 = 0.7$, $A_2 = 2.0$, $p = 0.6$. Left: average out-degree of the in-neighbours; Right: calculated density from the average out-degree.

Finally, we use the estimated density $\hat\rho$, together with the known values of all other
parameters, to calculate the distance between nodes from their number of common
neighbours, using the distance formula given earlier and the same simulation results as
before. Here we use the calculated density of the node of higher degree in the distance
formula. (Using the lower-degree node gives similar results.) The results are shown in Figure 4.
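The distance calculation can be sketched by solving the first-order Case 3 expression for $cn(u,v,n)$ from the proof of Theorem 2.4 for $d$, with all lower-order error terms ignored. Here $j \le k$ are the final in-degrees, $m$ is the dimension and $c_m$ the volume of the unit ball. The defaults $m = 2$ and $c_m = \pi$ (Euclidean plane) are assumptions for illustration only, and the function names are ours.

```python
import math


def expected_common_neighbours(d, j, k, n, rho, p, a1, m=2, c_m=math.pi):
    """First-order cn(u, v, n) in Case 3: p*j*(A1*k / (c_m*n*d^m))^(a/(1-a)), a = p*rho*A1."""
    a = p * rho * a1
    return p * j * (a1 * k / (c_m * n * d ** m)) ** (a / (1 - a))


def estimate_distance(cn, j, k, n, rho, p, a1, m=2, c_m=math.pi):
    """Solve the Case 3 formula for d, given cn common neighbours and degrees j <= k."""
    if not 0 < cn <= p * j:
        raise ValueError("expected 0 < cn <= p * j in the Case 3 regime")
    a = p * rho * a1
    return (a1 * k / (c_m * n)) ** (1 / m) * (p * j / cn) ** ((1 - a) / (m * a))


# Round trip with the Figure 3 dense-region parameters and an arbitrary distance 0.05.
cn = expected_common_neighbours(0.05, 50, 200, 100_000, 1.6, 0.6, 0.7)
print(cn, estimate_distance(cn, 50, 200, 100_000, 1.6, 0.6, 0.7))
```

In the experiments, `rho` would be the estimated density $\hat\rho$ of the higher-degree node rather than the true value.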

The figure shows very good agreement between the real and the estimated distances.
In fact, compared with using the true density, the agreement is greatly improved for the
cross-border pairs, and also better for the sparse-region pairs. This can be understood as
follows. The distance estimator is derived indirectly from Theorem 2.1, which predicts
the approximate degree of a node throughout the process, based on its final degree and
the density of its region. For nodes in the sparse region which have a sizeable number
of neighbours in the dense region, the degree will be larger than predicted by this
method, but the density estimator will also predict a higher density. So the estimated
density is a better indicator of the behaviour of the degree than the real density, and
thus the distance estimator performs better. This indicates that this last variation of
the distance estimator is the most robust against local fluctuations in density, which
bodes well for applying the estimator to real data, where such fluctuations are to be
expected.


4 Proofs 

In this section, we give the proofs of the main theorems. Our results all refer to typical
behaviour of the random SPA model process, and are asymptotic in $n$, the number of
vertices. We will sometimes use the stronger notion of w.e.p. in favour of the more
commonly used a.a.s., since it simplifies some of our proofs. We say that an event holds
with extreme probability (w.e.p.) if it holds with probability at least $1 - \exp(-\omega(n)\log n)$
as $n \to \infty$, where $\omega(n)$ is any function tending to infinity together with $n$. Thus, if we
consider a polynomial number of events that each hold w.e.p., then w.e.p. (and hence
also a.a.s.) all events hold.

Figure 4: Diagonal layout, $n = 100{,}000$, $\rho_d = 1.6$, $A_1 = 0.7$, $A_2 = 2.0$, $p = 0.7$: distance estimation using the estimated density of the node of greater final degree, all other parameters known.

First we state and prove a theorem that bounds the expected in-degree of any node,
regardless of its distance to the boundary.

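Theorem 4.1. Let $\omega = \omega(t)$ be any function tending to infinity together with $t$. The
expected in-degree at time $t$ of a node $v_i$ born at time $i \ge \omega$ satisfies

$$\mathbb{E}(\deg^-(v_i,t)) \le (1+o(1))\,\frac{A_2}{A_1}\left(\frac{t}{i}\right)^{p\rho_{\max}A_1}$$

and

$$\mathbb{E}(\deg^-(v_i,t)) \ge (1+o(1))\,\frac{A_2}{A_1}\left(\frac{t}{i}\right)^{p\left(2^{-m}\rho(v_i) + (1-2^{-m})\rho_{\min}\right)A_1} - \frac{A_2}{A_1}.$$

Moreover, for any node $v_i$ born at time $i \ge 1$ we have

$$\mathbb{E}(\deg^-(v_i,t)) \le e\,\frac{A_2}{A_1}\left(\frac{t}{i}\right)^{p\rho_{\max}A_1}.$$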




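Proof. In order to simplify calculations, we make the following substitution:

$$Y(v_i,t) = \deg^-(v_i,t) + \frac{A_2}{A_1} = \frac{t\,|S(v_i,t)|}{A_1}.$$

It follows immediately from the definition of the process that $Y(v_i,i) = A_2/A_1$ and, for
$t \ge i$,

$$Y(v_i,t+1) = \begin{cases} Y(v_i,t) + 1, & \text{with probability at most } p\rho_{\max}A_1\,Y(v_i,t)/t,\\ Y(v_i,t), & \text{otherwise.} \end{cases}$$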
Y{vi,t), otherwise. 
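We couple $Y(v_i,t)$ with another random variable $X(v_i,t)$ so that $Y(v_i,t) \le X(v_i,t)$
for $t \ge i$. The random variable $X(v_i,t)$ is defined as follows: $X(v_i,i) = A_2/A_1$ and, for $t \ge i$,

$$X(v_i,t+1) = \begin{cases} X(v_i,t) + 1, & \text{with probability } p\rho_{\max}A_1\,X(v_i,t)/t,\\ X(v_i,t), & \text{otherwise.} \end{cases}$$

Finding the conditional expectation,

$$\mathbb{E}\big(X(v_i,t+1) \mid X(v_i,t)\big) = X(v_i,t)\left(1 + \frac{p\rho_{\max}A_1}{t}\right).$$

Taking expectations, we get

$$\mathbb{E}(X(v_i,t+1)) = \mathbb{E}(X(v_i,t))\left(1 + \frac{p\rho_{\max}A_1}{t}\right)$$

and, since $X(v_i,i) = A_2/A_1$,

$$\mathbb{E}(X(v_i,t)) = \frac{A_2}{A_1}\prod_{j=i}^{t-1}\left(1 + \frac{p\rho_{\max}A_1}{j}\right) \le \frac{A_2}{A_1}\exp\left(\sum_{j=i}^{t-1}\frac{p\rho_{\max}A_1}{j}\right) \le \frac{A_2}{A_1}\exp\left(p\rho_{\max}A_1\left(\log\frac{t}{i} + \frac{1}{i}\right)\right) = e^{p\rho_{\max}A_1/i}\,\frac{A_2}{A_1}\left(\frac{t}{i}\right)^{p\rho_{\max}A_1}.$$

If $i \ge \omega$, we have

$$\mathbb{E}(X(v_i,t)) = (1+o(1))\,\frac{A_2}{A_1}\left(\frac{t}{i}\right)^{p\rho_{\max}A_1}.$$

Since $\deg^-(v_i,t) \le Y(v_i,t)$ and $\mathbb{E}(Y(v_i,t)) \le \mathbb{E}(X(v_i,t))$, this shows the upper bounds (for the
second one we use $e^{p\rho_{\max}A_1/i} \le e$, as $p\rho_{\max}A_1 \le 1$).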

For the lower bound, we first observe that for all nodes $v$, $|S(v,t) \cap \mathcal{R}(v)| \ge (1/2)^m|S(v,t)|$.
Thus the node $v_t$ links to $v_i$ with probability at least

$$p\left(2^{-m}\rho(v_i) + (1 - 2^{-m})\rho_{\min}\right)|S(v_i,t)|.$$

Using this, we can use exactly the same approach to bound the expectation of $Y(v_i,t)$
from below. This gives the lower bound. □
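The upper bound rests on the elementary estimate $\prod_{j=i}^{t-1}(1 + c/j) \le \exp(c(\log(t/i) + 1/i))$ with $c = p\rho_{\max}A_1$, and on the product being $(1+o(1))(t/i)^c$ once $i$ is large. A short numerical check, using $c = 0.6 \cdot 1.6 \cdot 0.7$ (the Figure 3 values, chosen only for illustration):

```python
import math
from typing import Tuple


def product_and_bound(c: float, i: int, t: int) -> Tuple[float, float]:
    """Return prod_{j=i}^{t-1} (1 + c/j) and the bound exp(c * (log(t/i) + 1/i))."""
    log_prod = sum(math.log1p(c / j) for j in range(i, t))
    return math.exp(log_prod), math.exp(c * (math.log(t / i) + 1 / i))


c = 0.6 * 1.6 * 0.7  # p * rho_max * A1 for the Figure 3 parameters
for i, t in [(1, 10), (1, 10_000), (1_000, 100_000)]:
    exact, bound = product_and_bound(c, i, t)
    # Ratios to (t/i)^c: the bound's ratio is e^{c/i}; the exact ratio tends to 1 as i grows.
    print(i, t, exact / (t / i) ** c, bound / (t / i) ** c)
```

Summing logarithms with `math.log1p` avoids overflow and keeps the product accurate for long ranges.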
Proof of Theorem 2.1


Here we show that, once a vertex has reached an in-degree of $\omega\log n$ and its area of
influence is well contained within the region, its degree can be closely predicted with
high probability. We will be using the following version of the Chernoff bound, as seen
in e.g. [..., p. 27, Corollary 2.3].

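Lemma 4.2. Let $X$ be a random variable that can be expressed as a sum of independent
random indicator variables, $X = \sum_{i=1}^{n} X_i$, where $X_i \in \mathrm{Be}(p_i)$ with (possibly) different
$p_i = \mathbb{P}(X_i = 1) = \mathbb{E}X_i$. If $\varepsilon \le 3/2$, then

$$\mathbb{P}(|X - \mathbb{E}X| \ge \varepsilon\,\mathbb{E}X) \le 2\exp\left(-\frac{\varepsilon^2\,\mathbb{E}X}{3}\right). \qquad (7)$$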
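A binomial random variable is the special case of Lemma 4.2 with equal $p_i$, so the bound (7) can be compared against exact tail probabilities. The sketch below does this for a few illustrative parameter choices (ours, not from the paper), with $\varepsilon \le 3/2$ as the lemma requires.

```python
import math


def binomial_two_sided_tail(n: int, q: float, eps: float) -> float:
    """Exact P(|X - EX| >= eps * EX) for X ~ Bin(n, q)."""
    mu = n * q
    return sum(
        math.comb(n, x) * q**x * (1 - q) ** (n - x)
        for x in range(n + 1)
        if abs(x - mu) >= eps * mu
    )


def chernoff_rhs(mu: float, eps: float) -> float:
    """Right-hand side of (7): 2 exp(-eps^2 EX / 3), stated for eps <= 3/2."""
    return 2 * math.exp(-eps**2 * mu / 3)


for n, q, eps in [(100, 0.3, 0.5), (200, 0.05, 1.0), (400, 0.1, 1.5)]:
    print(n, q, eps, binomial_two_sided_tail(n, q, eps), chernoff_rhs(n * q, eps))
```

The exact tails are typically far below the bound; (7) trades sharpness for a form that is easy to use in the proofs below.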


Let us start with the following key lemma. 

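Lemma 4.3. Let $\omega = \omega(n)$ be any function tending to infinity together with $n$, and let
$\varepsilon > 0$. For a given node $v$, suppose that $\deg^-(v,T) = d \ge \omega\log n$ and that

$$\delta(v) \ge (1+\varepsilon)\left(\frac{A_1 d + A_2}{c_m T}\right)^{1/m} = (1+\varepsilon)\,r(v,T).$$

Then, with probability $1 - O(n^{-3})$, for every value of $t$, $T \le t \le 2T$,

$$\left|\deg^-(v,t) - d\left(\frac{t}{T}\right)^{p\rho(v)A_1}\right| \le \frac{3}{p\rho(v)A_1}\cdot\frac{t}{T}\sqrt{d\log n}.$$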

Proof. Let $\rho = \rho(v)$. Our goal is to estimate $\deg^-(v,t) - d\,(t/T)^{p\rho A_1}$. We will show
that the upper bound holds; the lower bound can be obtained by an analogous,
symmetric argument. Note that the assumption on $\delta(v)$ implies that $S(v,T) \subseteq \mathcal{R}(v)$.
We use the following stopping time:

$$T_0 = \min\left(\left\{t \ge T : \deg^-(v,t) \ge d\left(\frac{t}{T}\right)^{p\rho A_1} + \frac{3}{p\rho A_1}\cdot\frac{t}{T}\sqrt{d\log n}\right\} \cup \{2T+1\}\right).$$

Note that if $T_0 = 2T+1$, then the in-degree of $v$ remained bounded as required during
the entire time interval $T \le t \le 2T$. Hence, in order to prove the bound, we need to
show that with probability $1 - O(n^{-3})$ we have $T_0 = 2T+1$.

Suppose that $T_0 \le 2T$. Note that for $t \ge T$ up to and including time-step $T_0 - 1$,
the random variable $\deg^-(v,t)$ is (deterministically) bounded from above. Moreover, it
is straightforward to see that this upper bound, together with the assumption on $\delta(v)$
(note the additional multiplicative $(1+\varepsilon)$ term), implies that $S(v,t) \subseteq \mathcal{R}(v)$ for all
$T \le t \le T_0 - 1$. Hence, the number of new neighbours accumulated during this phase
of the process, $\deg^-(v,T_0) - d$, can be (stochastically) bounded from above by the sum
$X = \sum_{t=T}^{T_0-1} X_t$ of independent indicator random variables, where

$$\mathbb{P}(X_t = 1) = p\rho\,\frac{A_1\left(d\,(t/T)^{p\rho A_1} + \frac{3}{p\rho A_1}\cdot\frac{t}{T}\sqrt{d\log n}\right) + A_2}{t}.$$

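Clearly, since $p\rho A_1 < 1$,

$$\mathbb{E}X = \sum_{t=T}^{T_0-1}\mathbb{E}X_t = \frac{p\rho A_1 d}{T^{p\rho A_1}}\sum_{t=T}^{T_0-1} t^{p\rho A_1 - 1} + \frac{T_0 - T}{T}\cdot 3\sqrt{d\log n} + O(1) = d\left(\left(\frac{T_0}{T}\right)^{p\rho A_1} - 1\right) + \frac{T_0 - T}{T}\cdot 3\sqrt{d\log n} + O(1).$$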

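Since $T_0 \le 2T$, the in-degree of $v$ at time $T_0$ failed the desired condition, which implies
that

$$X \ge \deg^-(v,T_0) - d \ge \mathbb{E}X + \frac{3}{p\rho A_1}\cdot\frac{T_0}{T}\sqrt{d\log n} - \frac{T_0 - T}{T}\cdot 3\sqrt{d\log n} - O(1) \ge \mathbb{E}X + 3\sqrt{d\log n}$$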

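for $n$ large enough, using again that it is assumed that $p\rho A_1 < 1$. It follows from the Chernoff bound (7)
that

$$\mathbb{P}\left(|X - \mathbb{E}X| \ge 3\sqrt{d\log n}\right) \le 2\exp\left(-\frac{\varepsilon^2\,\mathbb{E}X}{3}\right),$$

where $\varepsilon = 3\sqrt{d\log n}/\mathbb{E}X$. The maximum value of $\mathbb{E}X$ corresponds to $T_0 = 2T$, and so

$$\mathbb{E}X \le d\left(2^{p\rho A_1} - 1\right) + 3\sqrt{d\log n} + O(1) = d\left(2^{p\rho A_1} - 1\right)(1 + o(1)) < d.$$

So $\varepsilon > 3\sqrt{d^{-1}\log n}$, and $\varepsilon^2\,\mathbb{E}X/3 = 3d\log n/\mathbb{E}X > 3\log n$. Therefore, the probability that $T_0 \le 2T$
is at most $2\exp(-3\log n) = 2n^{-3}$, and the proof is finished. □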
Now, with Lemma 4.3 in hand, we can prove Theorem 2.1.

Proof of Theorem 2.1. Let $\omega = \omega(n)$ be a function going to infinity with $n$, and let
$\varepsilon > 0$. Let $v$ be a vertex with final degree $k \ge \omega\log n$, let $\rho = \rho(v)$, and assume that
$\delta = \delta(v) \ge (1+\varepsilon)\,r(v,n)$. Let $T'_v$ be the first time that the in-degree of $v$ exceeds $(\omega/2)\log n$,
and let $t'_v$ be the first time $t$ that the radius of influence satisfies $r(v,t) \le \delta(1+\varepsilon/2)^{-1}$.
Moreover, let $T = \max\{T'_v, t'_v\}$ be the first time that both events hold. Finally, let
$d = \deg^-(v,T)$. We obtain from Lemma 4.3 (applied with $\varepsilon/2$ in place of $\varepsilon$) that, with
probability $1 - O(n^{-3})$,

$$d\left(\frac{t}{T}\right)^{p\rho A_1}\left(1 - \frac{6}{p\rho A_1}\sqrt{d^{-1}\log n}\right) \le \deg^-(v,t) \le d\left(\frac{t}{T}\right)^{p\rho A_1}\left(1 + \frac{6}{p\rho A_1}\sqrt{d^{-1}\log n}\right)$$

for $T \le t \le 2T$ (here we used $(t/T)^{1-p\rho A_1} \le 2$). It follows that the degree tends to grow but the
sphere of influence tends to shrink between $T$ and $2T$, and thus the conditions of Lemma 4.3
again hold at time $2T$. We can now keep applying the same lemma for times $2T, 4T, 8T, 16T, \ldots$,
using the final value as the initial one for the next period, to get the statement for all
values of $t$ from $T$ up to and including time $n$. Precisely, let $d_0 = d$ and, for
$1 \le i \le i_{\max} = \lceil\log_2(n/T)\rceil$, let $d_i = \deg^-(v, 2^iT)$. Then, by Lemma 4.3, we have for $i \ge 1$ that

$$d_i \le d_{i-1}\,2^{p\rho A_1}(1 + \varepsilon_i), \qquad\text{where } \varepsilon_i = \frac{6}{p\rho A_1}\sqrt{d_{i-1}^{-1}\log n}.$$

Since we apply the lemma $O(\log n)$ times (for a given vertex $v$), the following statement
holds with probability $1 - o(n^{-2})$ from time $T$ on: for any $2^{i-1}T < t \le 2^iT$, we have that

$$\deg^-(v,t) \le d\left(\frac{t}{T}\right)^{p\rho A_1}\prod_{j=1}^{i}(1 + \varepsilon_j).$$

It remains to make sure that the accumulated multiplicative error term is still only
$(1+o(1))$. For that, note that $d_{j-1} = (1+o(1))\,d\,2^{(j-1)p\rho A_1}$, so

$$\prod_{j=1}^{i_{\max}}(1 + \varepsilon_j) \le \prod_{j=1}^{i_{\max}}\left(1 + (1+o(1))\frac{6}{p\rho A_1}\sqrt{d^{-1}2^{-(j-1)p\rho A_1}\log n}\right) \le \exp\left((1+o(1))\frac{6}{p\rho A_1}\sqrt{d^{-1}\log n}\sum_{j\ge 1}2^{-(j-1)p\rho A_1/2}\right) = \exp\left(O\left(\sqrt{d^{-1}\log n}\right)\right) = 1 + o(1),$$

since $d$ grows faster than $\log n$. A symmetric argument can be used to show a lower
bound for the error term, and so the result holds.

It follows that we have the desired behaviour from time $T$. Precisely, for times
$T \le t \le n$, we have that

$$\deg^-(v,t) = d\left(\frac{t}{T}\right)^{p\rho A_1}(1 + o(1)),$$

where $d = \deg^-(v,T) \ge \deg^-(v,T'_v) \ge (\omega/2)\log n$. As $T = \max\{T'_v, t'_v\}$, we need to
consider two cases. Suppose first that $T = T'_v$, so that $d = (1+o(1))(\omega/2)\log n$. Setting
$t = n$ and $\deg^-(v,n) = k$, we obtain that

$$T = (1+o(1))\left(\frac{d}{k}\right)^{1/(p\rho A_1)} n = (1+o(1))\left(\frac{\omega\log n}{2k}\right)^{1/(p\rho A_1)} n = (1+o(1))\,2^{-1/(p\rho A_1)}\,n\left(\frac{\omega\log n}{k}\right)^{1/(p\rho A_1)}.$$

Therefore, for large enough $n$, we have that $T \le T_v \le \max\{t_v, T_v\}$. Suppose then that
$T = t'_v$. By definition,

$$r(v,T) = (1+o(1))\,\delta\,(1+\varepsilon/2)^{-1}$$

and, since $d \ge (\omega/2)\log n$ and $k = (1+o(1))\,d\,(n/T)^{p\rho A_1}$,

$$r(v,T) = (1+o(1))\left(\frac{A_1 d}{c_m T}\right)^{1/m} = (1+o(1))\left(\frac{A_1 k}{c_m n^{p\rho A_1}T^{1-p\rho A_1}}\right)^{1/m},$$

and so

$$T = (1+o(1))\left(\frac{(1+\varepsilon/2)^m A_1 k}{c_m n^{p\rho A_1}\delta^m}\right)^{1/(1-p\rho A_1)} = (1+o(1))\left(\frac{1+\varepsilon/2}{1+\varepsilon}\right)^{m/(1-p\rho A_1)} t_v.$$

Again, for large enough $n$, we have that $T \le t_v \le \max\{t_v, T_v\}$. In either case,
$T \le \max\{t_v, T_v\}$. As a result, we obtain that, for $\max\{t_v, T_v\} \le t \le n$,

$$\deg^-(v,t) = k\left(\frac{t}{n}\right)^{p\rho A_1}(1 + o(1)).$$

Finally, since the statement holds for any vertex $v$ with probability $1 - o(n^{-2})$, with
probability $1 - o(n^{-1})$ the statement holds for all vertices. The proof of the theorem is
finished. □
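For a node whose sphere of influence stays inside its region, the in-degree follows the process used in the proof of Lemma 4.3: at step $t$ it grows by one with probability $p\rho(A_1\deg^- + A_2)/t$. The seeded simulation below is a sketch under that assumption (it ignores the region border and any cap on the sphere volume, and borrows the Figure 3 values purely for illustration). It compares $\deg^-(v,n)$ with the prediction $d(n/T)^{p\rho A_1}$.

```python
import random


def simulate_in_degree(d: int, T: int, n: int, rho: float, p: float,
                       a1: float, a2: float, seed: int = 0) -> int:
    """Run the interior in-degree process from time T (degree d) up to time n."""
    rng = random.Random(seed)
    deg = d
    for t in range(T, n):
        # The new node lands in S(v, t) (volume (A1*deg + A2)/t, density rho)
        # and then links to v with probability p.
        if rng.random() < min(1.0, p * rho * (a1 * deg + a2) / t):
            deg += 1
    return deg


d, T, n = 2000, 10_000, 100_000
rho, p, a1, a2 = 1.6, 0.6, 0.7, 2.0
observed = simulate_in_degree(d, T, n, rho, p, a1, a2)
predicted = d * (n / T) ** (p * rho * a1)
print(observed, round(predicted), observed / predicted)
```

With $d$ in the thousands the relative fluctuation is a few percent, matching the $(1+o(1))$ concentration once $d \gg \log n$.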


Let us note that Theorem 2.1 immediately implies the following two corollaries. 


Corollary 4.4. Let $\omega = \omega(n)$ be any function tending to infinity together with $n$, and
let $\varepsilon > 0$. The following holds with probability $1 - o(n^{-1})$. For every node $v$, and for
every time $T$ such that $\deg^-(v,T) \ge \omega\log n$ and $(1+\varepsilon)\,r(v,T) \le \delta(v)$, for all times $t$,
$T \le t \le n$,

$$\deg^-(v,t) = \deg^-(v,T)\left(\frac{t}{T}\right)^{p\rho(v)A_1}(1 + o(1)).$$


Corollary 4.5. Let $\omega = \omega(n)$ be any function tending to infinity together with $n$. The
following holds with probability $1 - o(n^{-1})$. For any node $v_i$ born at time $i \ge 1$, and
$i \le t \le n$, we have that

$$\deg^-(v_i,t) \le \omega\log n\left(\frac{t}{i}\right)^{p\rho_{\max}A_1}. \qquad (8)$$
Proof. The statement is trivially true if $\deg^-(v_i,t) < \omega\log n$, so we may assume that $t$ is
such that $\deg^-(v_i,t) \ge \omega\log n$. Write $v = v_i$ and $\rho = \rho(v)$, and let $T$ be the first time that
the in-degree of $v$ exceeds $(\omega/2)\log n$; clearly, $\deg^-(v,T) = (1/2 + o(1))\,\omega\log n$. First assume
that $(1+\varepsilon)\,r(v,T) \le \delta(v)$ for some $\varepsilon > 0$. By Corollary 4.4 (applied with $\omega/2$ in place of $\omega$),

$$\deg^-(v,t) = \deg^-(v,T)\left(\frac{t}{T}\right)^{p\rho A_1}(1 + o(1)) \le \omega\log n\left(\frac{t}{i}\right)^{p\rho_{\max}A_1},$$

where the last step follows since we may assume that $n$ is large enough so that the $(1+o(1))$
term is less than 2, and from the fact that $T \ge i$ and $\rho \le \rho_{\max}$.

However, even if $v$ is close to the border of the region, that is, $(1+\varepsilon)\,r(v,T) > \delta(v)$,
this argument applies. Namely, in this case the degree of $v$ is stochastically bounded
above by the degree of a node with the same birth time, but born in a region with
density $\rho_{\max}$ and positioned far from the border. Therefore, the argument above still
applies. □


Proof of Theorem 2.3 


Let $Z_j$ be the indicator variable of the event $\{v_j \in \mathcal{R}_\ell\}$. By definition of the process,
$Z_j$ is a Bernoulli variable with expectation $q_\ell$, and the variables $Z_j$ are independent
for different values of $j$. Thus, $|V(G_n) \cap \mathcal{R}_\ell|$ is the sum of $n$ independent Bernoulli random
variables, that is, a binomial random variable $\mathrm{Bin}(n, q_\ell)$. Hence $\mathbb{E}(|V(G_n) \cap \mathcal{R}_\ell|) = q_\ell n$, and it
follows from Lemma 4.2 that for every $\ell$, a.a.s. $\big||V(G_n) \cap \mathcal{R}_\ell| - q_\ell n\big| = O(\omega\sqrt{n}) = o(n)$, where
$\omega = \omega(n)$ is any function tending to infinity together with $n$. Since the number of regions is
assumed to be a constant independent of $n$, the desired property holds a.a.s. for all regions.

For the second statement, we examine the number of edges whose endpoints are in
the same region, that is, the number of edges that do not cross a boundary. We set
$M^\ell_t = |\{(u,v) \in E(G_t) : u,v \in \mathcal{R}_\ell\}|$. We create the indicator variable $X_{i,j}$ such that

$$X_{i,j} = \begin{cases} 1, & \text{if } v_i, v_j \in \mathcal{R}_\ell \text{ and } (v_i,v_j) \in E(G_n),\\ 0, & \text{otherwise.}\end{cases}$$

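Thus,

$$\mathbb{E}(M^\ell_{t+1} \mid G_t) = M^\ell_t + \sum_{j \le t}\mathbb{E}(X_{t+1,j} \mid G_t) = M^\ell_t + \sum_{v_j \in \mathcal{R}_\ell,\, j \le t} p\rho_\ell\,|S(v_j,t) \cap \mathcal{R}_\ell| = M^\ell_t + \sum_{v_j \in \mathcal{R}_\ell,\, j \le t} p\rho_\ell\,|S(v_j,t)| - p\rho_\ell Y^\ell_t, \qquad (9)$$

where

$$Y^\ell_t = \sum_{v_j \in \mathcal{R}_\ell,\, j \le t} |S(v_j,t) \setminus \mathcal{R}_\ell|.$$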
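Using the definition of $S(v_j,t)$, we obtain

$$\mathbb{E}(M^\ell_{t+1} \mid G_t) = M^\ell_t + \sum_{v_j \in \mathcal{R}_\ell,\, j \le t} p\rho_\ell\,\frac{A_1\deg^-(v_j,t) + A_2}{t} - p\rho_\ell Y^\ell_t = M^\ell_t + \frac{p\rho_\ell A_1}{t}\left(M^\ell_t + Z^\ell_t\right) + \frac{p\rho_\ell A_2}{t}\,|V(G_t) \cap \mathcal{R}_\ell| - p\rho_\ell Y^\ell_t, \qquad (10)$$

where

$$Z^\ell_t = |\{(u,v) \in E(G_t) : v \in \mathcal{R}_\ell,\ u \notin \mathcal{R}_\ell\}|.$$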


Let a = ppmaxAi- By Corollary 4.5, for any function u: = uj{n) tending to infinity 
together with n, with probability 1—o(n“^), for all vertices Vi and for all times i <t < n, 


deg {vi,t) < wlogn t 

i 


Let Q be the event that we have these upper bounds for all vertices Vi and for all times 
i <t < n. 

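Assume first that $\mathcal{Q}$ holds. It follows (deterministically) that for all vertices $v_i$ and
for all times $i \le t \le n$,

$$|S(v_i,t)| = \frac{A_1\deg^-(v_i,t) + A_2}{t} \le (\omega^2\log n)\,t^{\alpha-1}i^{-\alpha},$$

and thus

$$r(v_i,t) \le \left(\frac{|S(v_i,t)|}{c_m}\right)^{1/m} \le (\omega^2\log n)^{1/m}\,t^{(\alpha-1)/m}\,i^{-\alpha/m},$$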

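as usual, assuming that $n$ is large enough. Let $X_i$ be the indicator variable of the event
$\{S(v_i,t) \not\subseteq \mathcal{R}_\ell\}$. Using the bound on $|S(v_i,t)|$, we obtain

$$Y^\ell_t \le \sum_{i<t} X_i\,|S(v_i,t)| \le \sum_{i<t} X_i\,(\omega^2\log n)\,t^{\alpha-1}i^{-\alpha}.$$

If $X_i = 1$, then $v_i$ has distance at most $(\omega^2\log n)^{1/m}t^{(\alpha-1)/m}i^{-\alpha/m}$ from the boundary of $\mathcal{R}_\ell$.
Since the length of the boundary of $\mathcal{R}_\ell$ is at most 4, the boundary strip in which
$v_i$ must be located has area at most $4(\omega^2\log n)^{1/m}t^{(\alpha-1)/m}i^{-\alpha/m}$. Thus

$$\mathbb{E}(X_i \mid \mathcal{Q}) \le (4\omega^3\log^{1/m} n)\,t^{(\alpha-1)/m}\,i^{-\alpha/m}.$$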

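Combining this with the bound on $Y^\ell_t$, we obtain

$$\mathbb{E}(Y^\ell_t \mid \mathcal{Q}) \le \sum_{i<t}(4\omega^3\log^{1/m} n)\,t^{(\alpha-1)/m}i^{-\alpha/m}\,(\omega^2\log n)\,t^{\alpha-1}i^{-\alpha} = (4\omega^5\log^{1+1/m} n)\,t^{(\alpha-1)(1+1/m)}\sum_{i<t} i^{-\alpha(1+1/m)} \le c\,(\omega^5\log^{1+1/m} n)\,t^{-\beta},$$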
where $\beta = \min\{1/m, (1-\alpha)(1+1/m)\}$ and $c$ is an appropriate constant independent of $t$
and $n$. Note that $\beta > 0$, and thus, for $t \ge \log^{C} n$ with $C$ a sufficiently large constant, the expectation
of $Y^\ell_t$ is $o(1)$ and so negligible (as $\omega$ can be tending to infinity arbitrarily slowly).

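Next, consider the number of cross-border edges, $Z^\ell_t$. At time $t$, an edge from $v_t$
to $v_i$, $i < t$, which contributes to $Z^\ell_t$ can only be created if $v_t \in S(v_i,t) \setminus \mathcal{R}_\ell$. The
probability that such an edge is created is at most $p\rho_{\max}|S(v_i,t) \setminus \mathcal{R}_\ell|$. Thus we have
that

$$\mathbb{E}(Z^\ell_{t+1} \mid Z^\ell_t) \le Z^\ell_t + \sum_{v_i \in \mathcal{R}_\ell} p\rho_{\max}\,|S(v_i,t) \setminus \mathcal{R}_\ell| = Z^\ell_t + p\rho_{\max}Y^\ell_t.$$

Therefore,

$$\mathbb{E}(Z^\ell_t \mid \mathcal{Q}) \le \sum_{r=1}^{t} p\rho_{\max}\,\mathbb{E}(Y^\ell_r \mid \mathcal{Q}) \le c'\,(\omega^5\log^{1+1/m} n)\,t^{1-\beta}$$

for some constant $c'$ independent of $t$ and $n$.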

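By taking the expectation of (10) and setting $m^\ell_t = \mathbb{E}(M^\ell_t \mid \mathcal{Q})$, we obtain the following
recurrence for the (conditional) expected value $m^\ell_t$ and $t \ge \log^{C} n$:

$$m^\ell_{t+1} = m^\ell_t\left(1 + \frac{p\rho_\ell A_1}{t}\right) + p\rho_\ell q_\ell A_2 + o(1). \qquad (11)$$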


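To solve this recurrence, we use the following lemma on real sequences, which is
Lemma 3.1 from [...].

Lemma 4.6. If $(\alpha_t)$, $(\beta_t)$ and $(\gamma_t)$ are real sequences satisfying the relation

$$\alpha_{t+1} = \left(1 - \frac{\beta_t}{t}\right)\alpha_t + \gamma_t,$$

and $\lim_{t\to\infty}\beta_t = \beta > 0$ and $\lim_{t\to\infty}\gamma_t = \gamma$, then $\lim_{t\to\infty}\alpha_t/t$ exists and equals $\gamma/(1+\beta)$.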

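Using this lemma, we see that $\lim_{t\to\infty} m^\ell_t/t = p\rho_\ell q_\ell A_2/(1 - p\rho_\ell A_1)$, and so the (first-order)
solution of the recurrence (11) is

$$m^\ell_n = (1 + o(1))\,\frac{p\rho_\ell q_\ell A_2}{1 - p\rho_\ell A_1}\,n. \qquad (12)$$

Here we use that $p\rho_\ell A_1 < 1$ as given. Note that the $o(1)$ term in the recurrence and
the lower bound on $t$ only affect the $(1+o(1))$ term of the solution.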
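The first-order solution can be checked numerically by iterating recurrence (11) with the $o(1)$ term dropped. The sketch below uses $q_\ell = 1$ and the Figure 3 dense-region values as illustrative inputs (our choice); $m_t/t$ should approach $p\rho_\ell q_\ell A_2/(1 - p\rho_\ell A_1)$.

```python
def iterate_recurrence(a: float, b: float, n: int, m_start: float = 0.0) -> float:
    """Iterate m_{t+1} = m_t * (1 + a/t) + b from t = 1 up to t = n; return m_n / n."""
    m = m_start
    for t in range(1, n):
        m = m * (1 + a / t) + b
    return m / n


p, rho, q, a1, a2 = 0.6, 1.6, 1.0, 0.7, 2.0  # illustrative values, q = 1
a, b = p * rho * a1, p * rho * q * a2
limit = b / (1 - a)
for n in (10_000, 1_000_000):
    print(n, iterate_recurrence(a, b, n), limit)
```

The convergence is slow: the homogeneous part of the solution grows like $t^{p\rho_\ell A_1}$, so the relative error of $m_n/n$ decays only like $n^{p\rho_\ell A_1 - 1}$. This is consistent with the $(1+o(1))$ factor in (12).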

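Finally, since $\mathbb{P}(\bar{\mathcal{Q}}) = o(n^{-1})$ and (deterministically) $M^\ell_n \le \binom{n}{2}$, we get

$$\mathbb{E}(M^\ell_n) = \mathbb{P}(\mathcal{Q})\,m^\ell_n + \mathbb{P}(\bar{\mathcal{Q}})\,\mathbb{E}(M^\ell_n \mid \bar{\mathcal{Q}}) = (1+o(1))\,m^\ell_n + o(n) = (1+o(1))\,\frac{p\rho_\ell q_\ell A_2}{1 - p\rho_\ell A_1}\,n.$$

This completes the proof.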
Proof of Theorem 2.4


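Fix $\omega$ and $\varepsilon$ as in the statement of the theorem. Let $u$ and $v$ be nodes of final degrees
$\deg^-(u,n) = k$ and $\deg^-(v,n) = j$ such that $k \ge j \ge \omega\log n$, where both $u$ and $v$ are
located in a region $\mathcal{R}$ with density $\rho$. Let $T_v$ and $t_v$ be defined as in Theorem 2.1. Assume that
the conditions of the theorem hold, that is,

$$\delta(v)^m \ge cj \quad\text{and}\quad \delta(u)^m \ge ck, \qquad\text{where } c = (1+\varepsilon)^m\,\frac{A_1}{c_m n}\left(\frac{k}{\omega\log n}\right)^{1/(p\rho A_1) - 1}. \qquad (13)$$

Since $k \ge \omega\log n$ and $A_2 = o(j)$, it follows that

$$\delta(v) \ge (1+\varepsilon)(1+o(1))\left(\frac{A_1 j + A_2}{c_m n}\right)^{1/m} \quad\text{and}\quad \delta(u) \ge (1+\varepsilon)(1+o(1))\left(\frac{A_1 k + A_2}{c_m n}\right)^{1/m}.$$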


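Therefore, the conditions of Theorem 2.1 are satisfied. Thus, from time $\max\{T_v, t_v\}$
until the end of the process we have concentration of the degree of node $v$, as given
by Theorem 2.1. Recall that $T_v = n(\omega\log n/j)^{1/(p\rho A_1)}$, and that

$$t_v = \left(\frac{(1+\varepsilon)^m A_1 j}{c_m n^{p\rho A_1}\delta(v)^m}\right)^{1/(1-p\rho A_1)},$$

as in (3). Rewriting condition (13), we obtain that

$$T_v^{1-p\rho A_1} = n^{1-p\rho A_1}\left(\frac{\omega\log n}{j}\right)^{1/(p\rho A_1)-1} \ge n^{1-p\rho A_1}\left(\frac{\omega\log n}{k}\right)^{1/(p\rho A_1)-1} \ge \frac{(1+\varepsilon)^m A_1 j}{c_m n^{p\rho A_1}\delta(v)^m} = t_v^{1-p\rho A_1}.$$

Thus $\max\{T_v, t_v\} = T_v$ and so, in fact, the degree of $v$ is concentrated from time $T_v$
until time $n$.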

Similarly, let $t_u$ and $T_u$ be defined as in Theorem 2.1, for the node $u$. The argument above, with $j$ replaced
by $k$, shows that $\max\{t_u, T_u\} = T_u$. By definition, as $k \ge j$, it follows that $T_u \le T_v$.
Hence, by Theorem 2.1, we have concentration of the degrees of both $u$ and $v$ from
time $T_v$ on. Precisely, we have that, with probability $1 - o(n^{-1})$, for all pairs $u$ and
$v$ satisfying the conditions stated above, and for all times $t$, $T_v \le t \le n$,

$$\deg^-(u,t) = (1+o(1))\,k\left(\frac{t}{n}\right)^{p\rho A_1} \quad\text{and}\quad \deg^-(v,t) = (1+o(1))\,j\left(\frac{t}{n}\right)^{p\rho A_1}. \qquad (14)$$


The rest of the proof proceeds as the proof of Theorem 3.1 in [13].


Case 1. By (14) above and the definition of $T_v$ and $T_u$, we have that $\deg^-(v,T_v) =
(1+o(1))\,\omega\log n$ and $\deg^-(u,T_v) = (1+o(1))(\omega\log n)(k/j)$. This implies that the
radius of influence of $u$ satisfies

$$r(u,T_v) = \Theta\left(\left(\frac{(\omega\log n)(k/j)}{T_v}\right)^{1/m}\right).$$

The condition assumed in this case is that

$$d(u,v) \ge e.$$
Thus, at time $T_v$, both radii $r(u,T_v)$ and $r(v,T_v)$ are $O(d(u,v))$. Moreover, both radii
decrease (in order of magnitude) from time $T_v$ onwards, whereas $d(u,v)$ is independent of time. Thus
there must exist a constant $c$ (depending on $\varepsilon$ but not on $n$) such that, for all times $t$,
$cT_v \le t \le n$, the areas of influence of $u$ and $v$ are disjoint. Then the total number of common
neighbours can be at most $\min\{\deg^-(u,cT_v), \deg^-(v,cT_v)\} = O(\omega\log n)$.


Case 2. Suppose $d = d(u,v)$ satisfies

$$d(u,v) \le \left(\frac{A_1 k + A_2}{c_m n}\right)^{1/m} - \left(\frac{A_1 j + A_2}{c_m n}\right)^{1/m} = r(u,n) - r(v,n).$$

Note that this condition implies (deterministically) that at time $n$ the sphere of influence
of $v$ is contained in the sphere of influence of $u$. We now see that this situation occurs
in approximate form throughout the process. From (14), we see that, for $T_v \le t \le n$,
$\deg^-(v,t) = (1+o(1))(j/k)\deg^-(u,t)$, and thus $r(v,t) = (1+o(1))(j/k)^{1/m}\,r(u,t)$.

If $j \le ak$ for some constant $a \in (0,1)$, then the difference between $r(u,t)$ and $r(v,t)$
behaves as $(1+o(1))\,c\,r(u,t)$, where $c = 1 - (j/k)^{1/m}$ is bounded away from 0. In
particular, since $r(u,t)$ decreases with $t$, this difference tends to shrink over time. Thus we have
that, for $T_v \le t \le n$,

$$d(u,v) \le r(u,n) - r(v,n) \le (1+o(1))\left(r(u,t) - r(v,t)\right).$$

Therefore, all but a negligible fraction of the sphere of influence of $v$ lies inside the
sphere of influence of $u$ during the whole process.

If $k = (1+o(1))\,j$, so that $(k/j)^{1/m} = 1 + o(1)$, the difference between $r(u,t)$ and $r(v,t)$ is
$o(r(u,t))$ for $T_v \le t \le n$, and specifically at time $t = n$. From the condition of this case,
we know that at time $n$, $S(v,n)$ is completely contained inside $S(u,n)$. This, combined
with the fact that $r(u,t) - r(v,t) = o(r(u,t))$, implies that the spheres of influence of $u$
and $v$ overlap in all but a negligible part during the entire process.

Any vertex $w$ that links to $v$ must lie inside the sphere of influence of $v$. Since most
of the sphere of influence of $v$ is contained in that of $u$ in both cases mentioned above,
it is likely that $w$ lies inside the sphere of influence of $u$ as well, and
thus has probability $p$ of also linking to $u$. Accounting for the small variation in the
size of the spheres of influence, we have that the probability that a neighbour of $v$,
added between times $T_v$ and $n$, is also a neighbour of $u$ is $(1+o(1))\,p$. The number
of common neighbours accumulated until time $T_v$ is at most $\deg^-(v,T_v) = O(\omega\log n)$,
which is of smaller order than $j$, the final degree of $v$.

Therefore, $\mathbb{E}\,cn(u,v,n) = (1+o(1))\,pj$. Finally, note that the number of common
neighbours is a sum of independent random indicator variables with Bernoulli distribution;
each variable corresponds to a situation when a neighbour of $v$ falls into the sphere
of influence of $u$. It follows that $cn(u,v,n) \in \mathrm{Bin}((1+o(1))j, p)$. The concentration
follows from the bound (7) of Lemma 4.2.
Case 3. Suppose that $d = d(u,v)$ satisfies

$$r(u,n) - r(v,n) < d(u,v) < e.$$

The analysis for this case is based on the assumption that, at time $T_v$, the sphere of
influence of $v$ is contained in that of $u$, and at time $n$ the spheres are disjoint. Thus, in
some narrow time interval around a time $t_0$ between times $T_v$ and $n$, the spheres become
separated, and after this no more common neighbours can be formed. However, the
conditions of this case do not guarantee this situation: it may be that the
conditions hold, but the sphere of influence of $v$ is not completely contained in that of
$u$ at time $T_v$, or that the spheres are not completely disjoint at time $n$. Nevertheless, the
conditions guarantee that the asymptotic behaviour still applies in this case.

Let $t^-$ be the first time after $T_v$ when $S(v,t)$ is not completely contained
in $S(u,t)$. Let $t^+$ be the last time when the spheres overlap, or $t^+ = n$ if the spheres
overlap at time $n$. (So $T_v \le t^- \le t^+ \le n$.) From time $T_v$ until time $t^-$, each neighbour
of $v$ will be a common neighbour of $u$ and $v$ with probability $p$. From time $t^+$ to $n$,
no common neighbours can be created. From time $t^-$ until time $t^+$, the probability
that a neighbour of $v$ becomes a neighbour of $u$ is at most $p$. Thus, $p\deg^-(v,t^-)$ and
$p\deg^-(v,t^+)$ form a lower and an upper bound, respectively, on the expected number
of common neighbours of $u$ and $v$.

Since the centres of $S(u,t)$ and $S(v,t)$ are at distance $d$ from each other, the definitions
of $t^-$ and $t^+$ translate into the following conditions on the radii of the spheres of
influence:

$$r(u,t^-) - r(v,t^-) = (1+o(1))\,d, \qquad r(u,t^+) + r(v,t^+) = (1+o(1))\,d.$$

(The factor $(1+o(1))$ is caused by the fact that the spheres of influence increase or
decrease in discrete amounts.)

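Using Equation (14) for the degrees of $u$ and $v$ from time $T_v$, and translating this
into conditions on the radii of the spheres of influence, we obtain

$$r(u,t^-) - r(v,t^-) = (1+o(1))\left(1 - \left(\frac{j}{k}\right)^{1/m}\right)\left(\frac{A_1 k\,(t^-)^{p\rho A_1 - 1}}{c_m n^{p\rho A_1}}\right)^{1/m} = (1+o(1))\,d,$$

so that

$$t^- = (1+o(1))\left(\left(1 - \left(\frac{j}{k}\right)^{1/m}\right)^m\frac{A_1 k}{c_m n^{p\rho A_1}d^m}\right)^{1/(1-p\rho A_1)}.$$

A similar argument shows that

$$t^+ = (1+o(1))\left(\left(1 + \left(\frac{j}{k}\right)^{1/m}\right)^m\frac{A_1 k}{c_m n^{p\rho A_1}d^m}\right)^{1/(1-p\rho A_1)}.$$

Define $t_0 = (t^+ + t^-)/2$. Then, by the above, we get that $t^{\pm} = t_0\left(1 + o(1) \pm O\left((j/k)^{1/m}\right)\right)$, and

$$t_0 = \left(1 + o(1) + O\left((j/k)^{1/m}\right)\right)\left(\frac{A_1 k}{c_m n^{p\rho A_1}d^m}\right)^{1/(1-p\rho A_1)}.$$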

It follows from the discussion in Case 2 that a.a.s. the number of common neighbours
of $u$ and $v$ is bounded from below by $(1+o(1))\,p\deg^-(v,t^-)$, and from above by
$(1+o(1))\,p\deg^-(v,t^+)$. Using our knowledge about the behaviour of the in-degree of $v$
given by (14), this leads to the following expression:

$$cn(u,v,n) = pj\left(\frac{t_0}{n}\right)^{p\rho A_1}\left(1 + o(1) + O\left((j/k)^{1/m}\right)\right) = pj\left(\frac{A_1 k}{c_m n\,d^m}\right)^{p\rho A_1/(1-p\rho A_1)}\left(1 + o(1) + O\left((j/k)^{1/m}\right)\right).$$

This concludes the proof.
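Setting the error factor to 1 and solving the last expression for $d$ shows how a count of common neighbours can be turned into a distance estimate. This rearrangement is our own sketch, not a statement of the theorem; we write $\alpha = p\rho A_1$ for brevity:

$$\frac{cn(u,v,n)}{pj} = \left(\frac{A_1 k}{c_m n\,d^m}\right)^{\alpha/(1-\alpha)} \quad\Longleftrightarrow\quad d = \left(\frac{A_1 k}{c_m n}\right)^{1/m}\left(\frac{pj}{cn(u,v,n)}\right)^{(1-\alpha)/(m\alpha)}.$$

Since the exponent $(1-\alpha)/(m\alpha)$ is positive, fewer common neighbours correspond to a larger estimated distance, as expected.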